-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for CRLF's #1
base: master
Are you sure you want to change the base?
Conversation
If the next byte after a carriage return (CR) is a line feed (LF) then don't replace the CR. The result will be a CRLF which is supported by the Go CSV reader.
@@ -27,7 +27,7 @@ func New(r io.Reader) io.Reader { | |||
func (r reader) Read(p []byte) (n int, err error) { | |||
n, err = r.r.Read(p) | |||
for i, b := range p { | |||
if b == rByte { | |||
if j := i + 1; b == rByte && ((j < len(p) && p[j] != nByte) || j == len(p)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, I think this won't work in all cases. Read()
is allowed to return less than len(p)
bytes, even if it's not the end of the underlying Reader. So it can happen that a \r
falls on the last byte of a Read()
call, then the \n
is the first byte of the next Read()
call.
You can test this by creating your own Reader
implementation to pass into this for testing which simulates this unfortunate case.
One fix is to require that this reader take a bufio.Reader
as input rather than a io.Reader
so that you can always call Peek()
to get the next byte (I think that works).
Another fix could be to continue taking io.Reader
and then test whether it can be cast in bufio.Reader
and if not manually wrapping it.
Sorry for the delayed response. Could you explain the motivation behind this PR? The original motivation for the package is that the current Mac version of Excel outputs CSV files that the standard library CSV package can't read. Is there a program that outputs CSV files that this PR can parse correctly but the master branch can't? |
I'm encountering this now myself. The issue is if you do not know the exact newline char sequence your CSV has. For example in user generated content there could be basically any of (\r, \n, \r\n) as well you may or may not want to preserve the values within each field. Eg:
==>
|
@ctessum I don't think there is any issue in the master branch but when we use this reader, it'll create extra blank lines because of @aboodman, I was thinking about the same fix that you did but didn't know about the point you mentioned in the comment. So is there any workaround for that issue? How did you fix that? I'm not sure how to cast my |
So the original goal of this package was to be able to read CSV files created by the Mac version of Excel. I guess my question is: does the Mac version of Excel ever output CRLF's? If it does, then it would be within the scope of this package to handle them, but if not, then this change or any related change would be out of scope. |
In the current code, when a CRLF is encountered the result will be a LFLF. The Go CSV Reader will handle an unquoted LFLF by ignoring the empty line, but if the CRLF occurs inside of a quoted string then the LFLF will be maintained.
This change accounts for that by ignoring a CR that is followed by a LF. The resulting CRLF is properly handled by the CSV Reader which strips the preceding CR: "Carriage returns before newline characters are silently removed." (source).